New force-alignment API and two-pass alignment to get phone/state durations #300

dhdaines · 2022-09-21T18:08:09Z

Now you can (relatively) easily do a second pass of alignment to get phone durations after decoding or word alignment.

Also, word alignment now uses FSG search, like SoundSwallower, so it's really fast and also handles silence and alternate pronunciations for you.
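For readers new to forced alignment, here is a minimal sketch of the underlying idea in plain Python. It is not the code in this PR: it is a toy Viterbi pass over a fixed left-to-right sequence of units, where `scores[t][u]` (a hypothetical input) is the log score of frame `t` under unit `u`. A second pass for phone durations has the same shape, with phones as the units instead of words.

```python
from typing import List, Tuple


def force_align(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (start_frame, duration) for each unit, visiting units in order.

    scores[t][u] is an illustrative log score of frame t under unit u.
    Every unit must occupy at least one frame.
    """
    n_frames, n_units = len(scores), len(scores[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    neg_inf = float("-inf")
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    # entered[t][u] is True when the best path enters unit u at frame t.
    entered = [[False] * n_units for _ in range(n_frames)]
    best[0][0] = scores[0][0]
    for t in range(1, n_frames):
        for u in range(n_units):
            stay = best[t - 1][u]
            enter = best[t - 1][u - 1] if u > 0 else neg_inf
            if enter > stay:
                best[t][u] = enter + scores[t][u]
                entered[t][u] = True
            elif stay > neg_inf:
                best[t][u] = stay + scores[t][u]
    # Backtrace from the last unit at the last frame.
    starts = [0] * n_units
    u = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][u]:
            starts[u] = t
            u -= 1
    ends = starts[1:] + [n_frames]
    return [(s, e - s) for s, e in zip(starts, ends)]


# Six frames, three units: frames 0-1 match unit 0, 2-4 unit 1, 5 unit 2.
toy_scores = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
]
segments = force_align(toy_scores)
```

The real search also has to handle optional silences and alternate pronunciations, which this sketch leaves out entirely.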

lenzo-ka · 2022-09-21T22:35:23Z

Excited to check this out! I'm at Interspeech and out of phase by half a day and all, but I'll take a look shortly.

dhdaines · 2022-09-21T22:42:38Z

No problem! The CLI for state alignment isn't quite there yet, but coming soon (tonight, I hope).

jsalsman · 2022-09-21T22:57:38Z

Fantastic! I also hope to try this out ASAP. I wonder whether constraining to the first pass's word boundaries will help. It seems like it can't hurt, but it would be interesting to measure how much.


dhdaines · 2022-09-21T23:26:33Z

> Fantastic! I also hope to try this out ASAP. I wonder whether constraining to the first pass's word boundaries will help. It seems like it can't hurt, but it would be interesting to measure how much.

It will definitely make the alignment faster. It may also make it more accurate, though I am not certain of this. I have to look at how I implemented this back in 2006: https://www.cs.cmu.edu/~dhuggins/Publications/phlab.pdf

EDIT: that paper was about forward-backward and not alignment, so not the same thing at all. In that case I implemented something like semi-Viterbi training, setting "impossible" phone sequences to zero probability, which resulted in models that were better for alignment (but somewhat worse for recognition).
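To make the speed argument concrete, here is a rough sketch (an illustration, not the implementation in this PR) of how first-pass word boundaries shrink the second-pass search. The segment tuples, the `pronunciations` dictionary and the frame counts are all made up for the example; the point is only that each phone needs to be considered inside its own word's frame range rather than across the whole utterance.

```python
from typing import Dict, List, Set, Tuple


def allowed_phones(
    word_segs: List[Tuple[str, int, int]],
    pronunciations: Dict[str, List[str]],
) -> List[Set[int]]:
    """For each frame, return the phone positions permitted by word boundaries.

    word_segs holds (word, start_frame, duration) from a first pass.
    Phone positions index the flattened phone sequence of the utterance.
    """
    n_frames = max(start + dur for _, start, dur in word_segs)
    allowed: List[Set[int]] = [set() for _ in range(n_frames)]
    first_phone = 0
    for word, start, dur in word_segs:
        n_phones = len(pronunciations[word])
        for t in range(start, start + dur):
            allowed[t].update(range(first_phone, first_phone + n_phones))
        first_phone += n_phones
    return allowed


# Illustrative first-pass output and pronunciations (not real decoder output).
word_segs = [("go", 0, 4), ("forward", 4, 6)]
pronunciations = {
    "go": ["G", "OW"],
    "forward": ["F", "AO", "R", "W", "ER", "D"],
}
mask = allowed_phones(word_segs, pronunciations)

total_phones = sum(len(pronunciations[w]) for w, _, _ in word_segs)
unconstrained_cells = len(mask) * total_phones
constrained_cells = sum(len(frame) for frame in mask)
```

In this toy case the constrained search touches 44 frame/phone cells instead of 80. Whether the constraint also improves accuracy is the open question raised above, and this sketch says nothing about it.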


lenzo-ka

Hoping for state level alignments, and frame level scores also, but LGTM and WFM

dhdaines · 2022-09-26T22:37:01Z

> Hoping for state level alignments, and frame level scores also, but LGTM and WFM

State-level alignments are already there in the Python API (see cython/test/alignment_test.py for an example), and it is now easy to add them to the command-line front-end as well, so I'll do that (not on by default, though).
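As a sketch of what consuming a nested word/phone/state alignment could look like, the snippet below walks a tree of entries and converts frame counts to seconds. The `Entry` class is a hypothetical stand-in so the example runs without PocketSphinx; the attribute names (`name`, `start`, `duration`), the iteration over children, and the 100 frames-per-second rate are assumptions here, so check cython/test/alignment_test.py for the actual API.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

FRAME_RATE = 100  # assumed frames per second


@dataclass
class Entry:
    """Hypothetical stand-in for one alignment entry (word, phone or state)."""

    name: str
    start: int  # start frame
    duration: int  # length in frames
    children: List["Entry"] = field(default_factory=list)

    def __iter__(self) -> Iterator["Entry"]:
        return iter(self.children)


def flatten(
    entries: Iterable[Entry], level: int = 0
) -> Iterator[Tuple[int, str, float, float]]:
    """Yield (depth, name, start_seconds, duration_seconds), depth-first."""
    for entry in entries:
        yield (level, entry.name,
               entry.start / FRAME_RATE, entry.duration / FRAME_RATE)
        yield from flatten(entry, level + 1)


# Made-up alignment: one word containing two phones.
alignment = [
    Entry("go", 10, 20, [Entry("G", 10, 8), Entry("OW", 18, 12)]),
]
rows = list(flatten(alignment))
```

State entries would nest under each phone in the same way, adding one more level of depth.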

The bestpath search is not suitable for force-alignment, as it removes internal silences. It also sometimes produces bogus segments which are incompatible with state alignment and cause it to crash. For the moment you should never use bestpath search for alignment.


dhdaines added 7 commits September 19, 2022 13:41

feat!: rearrange alignment API

b86243f

feat: implement soundswallower word alignment

0a6617e

test: test word alignment

12c68d9

fix: use ps_add_align_text

76c098c

feat!: state alignment should still be a search module

944853b

feat: implement second-pass state alignment

47c4008

test: minimal test of second-pass state alignment

ba2cf68

dhdaines assigned lenzo-ka and unassigned lenzo-ka Sep 21, 2022

dhdaines requested a review from lenzo-ka September 21, 2022 19:20

dhdaines added 8 commits September 21, 2022 17:47

fix: memory leaks

fb1692b

Merge branch 'master' into enhanced_alignment

73ebc02

refactor: unreverse inputs

e9a66be

feat: implement alignment in CLI

4e6857d

docs: better jq example

d023d91

Merge branch 'master' into enhanced_alignment

0f6c64c

test: add test of "pocketsphinx align"

82ec81f

test: make alignment test do something

914131f

dhdaines added 6 commits September 21, 2022 21:24

docs: clarify usage of phone loop

883c759

feat: allow ps_activate_search(NULL) to reset to default

2eb920b

fix: memory leak

016e89f

docs: document reactivating default search

f40358c

feat: implement phone alignment

94d90c8

Note that we *won't* do state alignment here for the moment, as it is dubiously useful unless you are doing unsupervised MLLR, which should get a specific implementation.

test: add phone align test

bf0092c

dhdaines marked this pull request as ready for review September 22, 2022 01:30

docs: update docs for align

6139299

dhdaines added 5 commits September 23, 2022 14:13

test: update to new API

3cd2727

fix: retain alignment properly and document

6dbc960

fix: retain alignment properly to avoid double free

2929504

feat: fill in durations in ps_get_alignment()

e622a3d

feat: constrain alignments to word boundaries

954d16c

lenzo-ka approved these changes Sep 26, 2022


dhdaines added 21 commits September 27, 2022 09:29

feat: only activate hmms inside state align constraints

3f20080

fix: double free in state align seg

52168c0

fix: base words in state align hyp

cf39804

test: much more comprehensive alignment test

7c03d6e

fix: really enforce alignment constraints

51ae88d

fix: use probs from alignment in output

7b76ae0

fix: produce less bogus segmentations in fsg_search

a1e4873

THEY WERE TOTALLY BOGUS OMG! The word IDs were not converted! Who did that?!?!?

feat: fsg bestpath start/end are rather approximate

956df67

refactor: fix whitespace

88ba051

fix: remove fillers from hyp

381058e

test: no bestpath (it crashes on this utt)

be06d67

test: append to log

8f9a764

test: updated (correct) phone/word timings

c4609d1

fix: require an input or - for stdin (fixes #301)

a4a765b

fix: soxflags takes no inputs

d55a1d4

test: add missing -

e613496

test: some minor unexplained differences, do not care

90b3b68

fix: memory leak

ed8acbd

feat: add -state_align to get state alignments

fbf398a

test: also test state alignments

6dba1b2

dhdaines merged commit 68c5db8 into master Sep 27, 2022

dhdaines deleted the enhanced_alignment branch September 28, 2022 19:20
